Adaptive Probabilistic Policy Reuse

Authors

  • Yann Chevaleyre
  • Aydano Machado

Abstract

Transfer algorithms use knowledge previously learned on related tasks to speed up learning of the current task. Recently, many complex reinforcement learning problems have been successfully solved by efficient transfer learners. However, most of these algorithms suffer from a severe flaw: they are implicitly tuned to transfer knowledge between tasks with a given degree of similarity. In other words, if the previous task is very dissimilar to the current task, the transfer process might slow down learning; if it is nearly identical, the speed-up might be far from optimal. In this paper, we address this issue by explicitly optimizing the transfer rate between tasks, and we answer the question: “can the transfer rate be accurately optimized, and at what cost?” We show that this optimization problem is related to the continuum bandit problem. Based on this relation, we design a generic adaptive transfer method, which we evaluate on a grid-world task.
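The idea of tuning the transfer rate online, as a bandit problem, can be illustrated with a small sketch. It is not the authors' method. It assumes that the transfer rate psi is the per-step probability of following a past policy. This loosely follows the three-way choice of Probabilistic Policy Reuse listed among the related articles: follow the past policy, explore at random, or act greedily on the current Q-values. Instead of a true continuum-bandit algorithm, it discretizes [0, 1] into five arms and chooses among them with the standard UCB1 rule. The grid world, `past_policy`, and the episode payoff (gamma raised to the episode length, or 0 if the goal is not reached) are all hypothetical.

```python
import math
import random

# Hypothetical 5x5 grid world: start (0, 0), goal (4, 4); not the paper's benchmark.
SIZE = 5
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # +y, +x, -y, -x


def step(state, a):
    """Move one cell, clipping at the walls; the episode ends at GOAL."""
    dx, dy = ACTIONS[a]
    nxt = (min(max(state[0] + dx, 0), SIZE - 1), min(max(state[1] + dy, 0), SIZE - 1))
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def past_policy(state):
    """Hypothetical policy learned on a related task: go right, then up."""
    return 1 if state[0] < SIZE - 1 else 0


def run_episode(Q, psi, rng, eps=0.1, alpha=0.5, gamma=0.95, max_steps=100):
    """One Q-learning episode with transfer rate psi; returns gamma**steps or 0."""
    state = (0, 0)
    for t in range(1, max_steps + 1):
        if rng.random() < psi:
            a = past_policy(state)                 # reuse the past policy
        elif rng.random() < eps:
            a = rng.randrange(len(ACTIONS))        # random exploration
        else:
            best = max(Q[state])                   # greedy, random tie-break
            a = rng.choice([b for b, q in enumerate(Q[state]) if q == best])
        nxt, r, done = step(state, a)
        target = r if done else r + gamma * max(Q[nxt])
        Q[state][a] += alpha * (target - Q[state][a])  # off-policy Q-learning update
        state = nxt
        if done:
            return gamma ** t                      # shorter episodes pay more
    return 0.0


def adaptive_transfer(n_episodes=300, arms=(0.0, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Pick psi for each episode with UCB1 over a discretized [0, 1]."""
    rng = random.Random(seed)
    Q = {(x, y): [0.0] * len(ACTIONS) for x in range(SIZE) for y in range(SIZE)}
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, n_episodes + 1):
        if t <= len(arms):
            i = t - 1                              # play every arm once first
        else:
            n = t - 1                              # total plays so far
            i = max(range(len(arms)),
                    key=lambda k: sums[k] / counts[k] + math.sqrt(2 * math.log(n) / counts[k]))
        payoff = run_episode(Q, arms[i], rng)
        counts[i] += 1
        sums[i] += payoff
    return counts, sums


counts, sums = adaptive_transfer()
for psi, c, s in zip((0.0, 0.25, 0.5, 0.75, 1.0), counts, sums):
    print(f"psi={psi:.2f}  episodes={c:3d}  mean payoff={s / c:.3f}")
```

UCB1 assumes stationary payoffs. Here the payoff of each arm drifts as the Q-table improves, so the sketch is only an approximation. A continuum-bandit method would also avoid fixing the grid of candidate rates in advance.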

Related articles

Adaptive Capacity Sharing through Probabilistic Controlled Placement

As capacity demands vary among simultaneously executed threads in chip multiprocessors, dynamically managing cache resources according to run-time demands is an effective way to improve L2 cache performance. Unlike existing dynamic cache management schemes based on the LRU replacement policy, we propose an adaptive capacity sharing mechanism based on a global reuse replacement policy. This mech...

Probabilistic Policy Reuse

We contribute Policy Reuse as a technique to improve a reinforcement learner with guidance from past learned similar policies. Our method relies on using the past policies in a novel way as a probabilistic bias where the learner faces three choices: the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. We introduce t...

Cache Replacement Policy Using Map-based Adaptive Insertion

In this paper, we propose a map-based adaptive insertion policy (MAIP) for a novel cache replacement scheme. The MAIP estimates the possibility of data reuse on the basis of data reuse history. To track data reuse history, the MAIP employs a bitmap data structure, which we call a memory access map. The memory access map holds all accessed memory locations in a fixed-size memory area to detect the data reu...

Probabilistic Policy Reuse for inter-task transfer learning

Policy Reuse is a reinforcement learning technique that efficiently learns a new policy by using past similar learned policies. The Policy Reuse learner improves its exploration by probabilistically including the exploitation of those past policies. Policy Reuse was previously introduced and shown to be effective in problems with different reward functions in the same state and action ...

Probabilistic Reuse of Past Policies

A past policy provides a bias to guide the exploration of the environment and speed up the learning of a new action policy. The success of this bias depends on whether the past policy is “similar” to the actual policy or not. In this report we describe a new algorithm, PRQ-Learning, that reuses a set of past policies to bias the learning of a new one. The past policies are ranked following a si...

Reusing and Building a Policy Library

Policy Reuse is a method to improve reinforcement learning with the ability to solve multiple tasks by building upon past problem solving experience, as accumulated in a Policy Library. Given a new task, a Policy Reuse learner uses the past policies in the library as a probabilistic bias in its new learning process. We present how the effectiveness of each reuse episode is indicative of the nov...

Publication date: 2012